# Video-Language Pretraining

LanguageBind is a language-centric multimodal pretraining method (accepted at ICLR 2024) that uses language as the bond between modalities, extending video-language pretraining to N modalities such as video, audio, infrared, depth, and thermal imaging. All of the checkpoints listed below are published by the LanguageBind organization under the MIT license and are tagged Multimodal Alignment / Transformers.

| Model | Description | Downloads | Likes |
|---|---|---|---|
| LanguageBind Video Huge V1.5 FT | Pretrained model that achieves multimodal semantic alignment through language, binding modalities such as video, audio, depth, and thermal imaging with language to enable cross-modal understanding and retrieval. | 2,711 | 4 |
| LanguageBind Video V1.5 FT | Language-centric multimodal pretraining model that uses language as the bond between modalities to achieve multimodal semantic alignment. | 853 | 5 |
| LanguageBind Audio FT | Language-centric multimodal pretraining model that achieves semantic alignment by using language as the bridge between modalities. | 12.59k | 1 |
| LanguageBind Video FT | Language-centric multimodal pretraining model that uses language as the bond to achieve semantic alignment across video, infrared, depth, audio, and other modalities. | 22.97k | 4 |
| LanguageBind Video Merge | Multimodal model that extends video-language pretraining to N modalities through language-based semantic alignment. | 10.96k | 4 |
| LanguageBind Image | Language-centric multimodal pretraining model that uses language as the bond between modalities to achieve semantic alignment. | 25.71k | 11 |
| LanguageBind Depth | Language-centric multimodal pretraining model that uses language as the bond to achieve semantic alignment across video, infrared, depth, audio, and other modalities. | 898 | 0 |
| LanguageBind Video | Multimodal pretraining framework that extends video-language pretraining to N modalities through language-based semantic alignment. | 166 | 2 |
| LanguageBind Thermal | Pretraining framework that achieves multimodal semantic alignment through language as the bond, supporting joint learning of modalities such as video, infrared, depth, and audio with language. | 887 | 1 |
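
Conceptually, each modality has its own encoder whose outputs are aligned to a shared language embedding space with a CLIP-style symmetric contrastive objective; once aligned, cross-modal retrieval reduces to cosine similarity against a text query. The PyTorch sketch below illustrates that idea only. The encoder outputs, embedding dimension, and temperature are placeholder assumptions, not the actual LanguageBind implementation or API.

```python
import torch
import torch.nn.functional as F

def symmetric_contrastive_loss(text_emb: torch.Tensor,
                               modality_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """CLIP-style symmetric contrastive loss that pulls each modality
    embedding toward its paired language embedding. Both inputs are
    (batch, dim); row i of each tensor is a matched pair."""
    text_emb = F.normalize(text_emb, dim=-1)
    modality_emb = F.normalize(modality_emb, dim=-1)
    logits = text_emb @ modality_emb.t() / temperature   # (batch, batch) similarities
    targets = torch.arange(text_emb.size(0), device=text_emb.device)
    # Matched pairs sit on the diagonal; off-diagonal entries act as negatives.
    loss_t2m = F.cross_entropy(logits, targets)
    loss_m2t = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_t2m + loss_m2t)

def retrieve(query_text_emb: torch.Tensor,
             candidate_embs: torch.Tensor,
             top_k: int = 5):
    """Cross-modal retrieval: rank candidate embeddings (video, audio,
    depth, thermal, ...) by cosine similarity to one text query embedding."""
    query = F.normalize(query_text_emb, dim=-1)
    candidates = F.normalize(candidate_embs, dim=-1)
    scores = candidates @ query                           # (num_candidates,)
    return scores.topk(min(top_k, scores.numel()))

if __name__ == "__main__":
    # Placeholder tensors standing in for real encoder outputs
    # (e.g. a video tower and a text tower projecting into a shared space).
    dim, batch = 768, 8                                   # dim is an assumption
    text = torch.randn(batch, dim)
    video = torch.randn(batch, dim)
    print("contrastive loss:", symmetric_contrastive_loss(text, video).item())
    scores, indices = retrieve(text[0], video)
    print("top matches for query 0:", indices.tolist())
```

Because every modality is aligned to the same language space, video, audio, depth, and thermal embeddings can also be compared with one another through that space, which is what the descriptions above mean by binding modalities through language.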